Dual Support Vector Machine

Roadmap

(1) Embedding Numerous Features: Kernel Models

    Lecture 1: Linear Support Vector Machine
    linear SVM: more robust and solvable with quadratic programming

    Lecture 2: Dual Support Vector Machine
    • Motivation of Dual SVM
    • Lagrange Dual SVM
    • Solving Dual SVM
    • Messages behind Dual SVM

(2) Combining Predictive Features: Aggregation Models
(3) Distilling Implicit Features: Extraction Models
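Dual Support Vector Machine: Motivation of Dual SVM

Non-Linear Support Vector Machine Revisited

Non-Linear Hard-Margin SVM

$$\min_{b,\mathbf{w}} \ \frac{1}{2}\mathbf{w}^T\mathbf{w} \quad \text{s.t.} \quad y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1, \ \text{for } n = 1, 2, \ldots, N$$

with $\mathbf{z}_n = \Phi(\mathbf{x}_n)$, where $\tilde{d}$ denotes the dimension of the transformed space.

• solve $(b, \mathbf{w}) \leftarrow \mathrm{QP}(Q, \mathbf{p}, A, \mathbf{c})$
• return $b \in \mathbb{R}$ & $\mathbf{w} \in \mathbb{R}^{\tilde{d}}$ with $g_{\mathrm{SVM}}(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^T\Phi(\mathbf{x}) + b)$

• demanded: not many (large-margin), but sophisticated boundary (feature transform)
• QP with $\tilde{d} + 1$ variables and $N$ constraints: challenging if $\tilde{d}$ large, or infinite?! :-)

goal: SVM without dependence on $\tilde{d}$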





Todo: SVM ‘without’ $\tilde{d}$

Original SVM: (convex) QP of
• $\tilde{d} + 1$ variables
• $N$ constraints

‘Equivalent’ SVM: (convex) QP of
• $N$ variables
• $N + 1$ constraints

• introduce some necessary math to help understand SVM deeper
• ‘claim’ some results if details unnecessary, like how we ‘claimed’ Hoeffding

‘Equivalent’ SVM: based on some dual problem of Original SVM
Key Tool: Lagrange Multipliers

Regularization by Constrained-Minimizing $E_{\mathrm{in}}$:

$$\min_{\mathbf{w}} \ E_{\mathrm{in}}(\mathbf{w}) \quad \text{s.t.} \quad \mathbf{w}^T\mathbf{w} \le C$$

⇔ Regularization by Minimizing $E_{\mathrm{aug}}$:

$$\min_{\mathbf{w}} \ E_{\mathrm{aug}}(\mathbf{w}) = E_{\mathrm{in}}(\mathbf{w}) + \frac{\lambda}{N}\mathbf{w}^T\mathbf{w}$$

• $C$ equivalent to some $\lambda \ge 0$ by checking optimality condition $\nabla E_{\mathrm{in}}(\mathbf{w}) + \frac{2\lambda}{N}\mathbf{w} = \mathbf{0}$
• regularization: view $\lambda$ as given parameter instead of $C$, and solve ‘easily’
• dual SVM: view $\lambda$’s as unknown given the constraints, and solve them as variables instead

how many $\lambda$’s as variables? $N$: one per constraint
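Starting Point: Constrained to ‘Unconstrained’

$$\min_{b,\mathbf{w}} \ \frac{1}{2}\mathbf{w}^T\mathbf{w} \quad \text{s.t.} \quad y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1, \ \text{for } n = 1, 2, \ldots, N$$

Lagrange function: with Lagrange multipliers $\alpha_n$,

$$\mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha}) = \underbrace{\frac{1}{2}\mathbf{w}^T\mathbf{w}}_{\text{objective}} + \sum_{n=1}^{N} \alpha_n \underbrace{\bigl(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\bigr)}_{\text{constraint}}$$

Claim:

$$\text{SVM} \equiv \min_{b,\mathbf{w}} \Bigl( \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha}) \Bigr) = \min_{b,\mathbf{w}} \Bigl( \infty \text{ if violate};\ \tfrac{1}{2}\mathbf{w}^T\mathbf{w} \text{ if feasible} \Bigr)$$

• any ‘violating’ $(b,\mathbf{w})$: $\max_{\text{all } \alpha_n \ge 0} \bigl( \tfrac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_n \alpha_n (\text{some positive}) \bigr) \to \infty$
• any ‘feasible’ $(b,\mathbf{w})$: $\max_{\text{all } \alpha_n \ge 0} \bigl( \tfrac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_n \alpha_n (\text{all non-positive}) \bigr) = \tfrac{1}{2}\mathbf{w}^T\mathbf{w}$

constraints now hidden in max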
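The claim about the inner maximization can be checked numerically on a tiny example. The sketch below is only an illustration: the two data points, the helper name `lagrangian`, and the chosen `(b, w)` pairs are assumptions made for this example, not part of the lecture.

```python
# Toy illustration (hypothetical data): for alpha >= 0 the Lagrangian of a
# feasible (b, w) never exceeds its objective, while for a violating (b, w)
# it grows without bound as alpha grows.

def lagrangian(b, w, alpha, Z, y):
    """L(b, w, alpha) = 0.5 * w.w + sum_n alpha_n * (1 - y_n * (w.z_n + b))."""
    objective = 0.5 * sum(wi * wi for wi in w)
    penalty = 0.0
    for a_n, z_n, y_n in zip(alpha, Z, y):
        score = sum(wi * zi for wi, zi in zip(w, z_n)) + b
        penalty += a_n * (1.0 - y_n * score)
    return objective + penalty

Z = [(2.0, 0.0), (-2.0, 0.0)]    # two examples with z_1 = z and z_2 = -z
y = [+1, -1]

feasible = (0.0, (1.0, 0.0))     # y_n * (w.z_n + b) = 2 >= 1 for both examples
violating = (0.0, (0.25, 0.0))   # y_n * (w.z_n + b) = 0.5 < 1 for both examples

for scale in (0.0, 1.0, 10.0, 100.0):
    alpha = (scale, scale)
    print(scale,
          lagrangian(*feasible, alpha, Z, y),
          lagrangian(*violating, alpha, Z, y))
```

For the feasible pair the largest printed value occurs at `alpha = 0` and equals the objective; for the violating pair the value keeps increasing with `scale`.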
Fun Time

Consider two transformed examples $(\mathbf{z}_1, +1)$ and $(\mathbf{z}_2, -1)$ with $\mathbf{z}_1 = \mathbf{z}$ and $\mathbf{z}_2 = -\mathbf{z}$. What is the Lagrange function $\mathcal{L}(b, \mathbf{w}, \boldsymbol{\alpha})$ of hard-margin SVM?

(1) $\frac{1}{2}\mathbf{w}^T\mathbf{w} + \alpha_1(1 + \mathbf{w}^T\mathbf{z} + b) + \alpha_2(1 + \mathbf{w}^T\mathbf{z} + b)$
(2) $\frac{1}{2}\mathbf{w}^T\mathbf{w} + \alpha_1(1 - \mathbf{w}^T\mathbf{z} - b) + \alpha_2(1 - \mathbf{w}^T\mathbf{z} + b)$
(3) $\frac{1}{2}\mathbf{w}^T\mathbf{w} + \alpha_1(1 + \mathbf{w}^T\mathbf{z} + b) + \alpha_2(1 + \mathbf{w}^T\mathbf{z} - b)$
(4) $\frac{1}{2}\mathbf{w}^T\mathbf{w} + \alpha_1(1 - \mathbf{w}^T\mathbf{z} - b) + \alpha_2(1 - \mathbf{w}^T\mathbf{z} - b)$

Reference Answer: (2)

By definition,

$$\mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + \alpha_1\bigl(1 - y_1(\mathbf{w}^T\mathbf{z}_1 + b)\bigr) + \alpha_2\bigl(1 - y_2(\mathbf{w}^T\mathbf{z}_2 + b)\bigr)$$

with $(\mathbf{z}_1, y_1) = (\mathbf{z}, +1)$ and $(\mathbf{z}_2, y_2) = (-\mathbf{z}, -1)$.
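Dual Support Vector Machine: Lagrange Dual SVM

Lagrange Dual Problem

for any fixed $\boldsymbol{\alpha}'$ with all $\alpha'_n \ge 0$,

$$\min_{b,\mathbf{w}} \Bigl( \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha}) \Bigr) \ \ge\ \min_{b,\mathbf{w}} \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha}')$$

because max ≥ any

for best $\boldsymbol{\alpha}' \ge \mathbf{0}$ on RHS,

$$\min_{b,\mathbf{w}} \Bigl( \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha}) \Bigr) \ \ge\ \underbrace{\max_{\text{all } \alpha'_n \ge 0} \ \min_{b,\mathbf{w}} \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha}')}_{\text{Lagrange dual problem}}$$

because best is one of any

Lagrange dual problem: ‘outer’ maximization of $\boldsymbol{\alpha}$ on lower bound of original problem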




Strong Duality of Quadratic Programming

$$\underbrace{\min_{b,\mathbf{w}} \Bigl( \max_{\text{all } \alpha_n \ge 0} \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha}) \Bigr)}_{\text{equiv. to original (primal) SVM}} \ \ge\ \underbrace{\max_{\text{all } \alpha_n \ge 0} \Bigl( \min_{b,\mathbf{w}} \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha}) \Bigr)}_{\text{Lagrange dual}}$$

• ‘≥’: weak duality
• ‘=’: strong duality, true for QP if
    • convex primal
    • feasible primal (true if $\Phi$-separable)
    • linear constraints
  (called constraint qualification)

exists primal-dual optimal solution $(b, \mathbf{w}, \boldsymbol{\alpha})$ for both sides
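Weak and strong duality can be seen on a one-variable convex QP. The problem below (minimize $\frac{1}{2}w^2$ subject to $w \ge 1$) is a hypothetical toy chosen for this sketch, and the grid search over $\alpha$ is only a crude stand-in for the outer maximization.

```python
# Hypothetical 1-D convex QP: minimize 0.5 * w**2 subject to w >= 1.
# Lagrangian: L(w, alpha) = 0.5 * w**2 + alpha * (1 - w), with alpha >= 0.

def dual_function(alpha):
    """min over w of L(w, alpha); setting dL/dw = 0 gives the minimizer w = alpha."""
    w = alpha
    return 0.5 * w * w + alpha * (1.0 - w)

primal_optimum = 0.5 * 1.0 ** 2                 # attained at the boundary point w = 1

alphas = [k / 100.0 for k in range(0, 301)]     # grid over alpha in [0, 3]
dual_values = [dual_function(a) for a in alphas]

# Weak duality: every dual value is a lower bound on the primal optimum.
weak_duality_holds = all(v <= primal_optimum + 1e-12 for v in dual_values)

# Strong duality: the best lower bound on the grid meets the primal optimum.
best_value = max(dual_values)
best_alpha = alphas[dual_values.index(best_value)]
print(weak_duality_holds, best_alpha, best_value, primal_optimum)
```

The toy problem is convex, feasible, and has a linear constraint, so the three conditions listed above are met and the gap closes.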
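Solving Lagrange Dual: Simplifications (1/2)

$$\max_{\text{all } \alpha_n \ge 0} \Bigl( \min_{b,\mathbf{w}} \underbrace{\frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n\bigl(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\bigr)}_{\mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha})} \Bigr)$$

• inner problem ‘unconstrained’, at optimal: $\frac{\partial \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha})}{\partial b} = 0 = -\sum_{n=1}^{N} \alpha_n y_n$
• no loss of optimality if solving with constraint $\sum_{n=1}^{N} \alpha_n y_n = 0$

but wait, $b$ can be removed

$$\max_{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0} \Bigl( \min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n\bigl(1 - y_n(\mathbf{w}^T\mathbf{z}_n)\bigr) - \underbrace{\sum_{n=1}^{N} \alpha_n y_n}_{=\,0} \cdot b \Bigr)$$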





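Solving Lagrange Dual: Simplifications (2/2)

$$\max_{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0} \Bigl( \min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n\bigl(1 - y_n(\mathbf{w}^T\mathbf{z}_n)\bigr) \Bigr)$$

• inner problem ‘unconstrained’, at optimal: $\frac{\partial \mathcal{L}(b,\mathbf{w},\boldsymbol{\alpha})}{\partial w_i} = 0 = w_i - \sum_{n=1}^{N} \alpha_n y_n z_{n,i}$
• no loss of optimality if solving with constraint $\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n$

but wait!

$$\max_{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0,\ \mathbf{w} = \sum \alpha_n y_n \mathbf{z}_n} \Bigl( \min_{b,\mathbf{w}} \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n - \mathbf{w}^T\mathbf{w} \Bigr)$$

$$\Longleftrightarrow \max_{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0,\ \mathbf{w} = \sum \alpha_n y_n \mathbf{z}_n} \ -\frac{1}{2}\Bigl\| \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n \Bigr\|^2 + \sum_{n=1}^{N} \alpha_n$$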
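The substitution behind the last step can be spelled out term by term. The expansion below uses only the two constraints already stated, $\sum_n y_n\alpha_n = 0$ and $\mathbf{w} = \sum_n \alpha_n y_n \mathbf{z}_n$, and introduces no new symbols.

```latex
\begin{aligned}
\frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n\bigl(1 - y_n \mathbf{w}^T\mathbf{z}_n\bigr)
  &= \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n
     - \mathbf{w}^T \Bigl( \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n \Bigr) \\
  &= \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{n=1}^{N} \alpha_n - \mathbf{w}^T\mathbf{w} \\
  &= -\frac{1}{2}\Bigl\| \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n \Bigr\|^2 + \sum_{n=1}^{N} \alpha_n
\end{aligned}
```

After this step neither $b$ nor $\mathbf{w}$ appears as a free variable, so the inner minimization is gone and only the maximization over $\boldsymbol{\alpha}$ remains.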





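KKT Optimality Conditions

$$\max_{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0,\ \mathbf{w} = \sum \alpha_n y_n \mathbf{z}_n} \ -\frac{1}{2}\Bigl\| \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n \Bigr\|^2 + \sum_{n=1}^{N} \alpha_n$$

if primal-dual optimal $(b, \mathbf{w}, \boldsymbol{\alpha})$,
• primal feasible: $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1$
• dual feasible: $\alpha_n \ge 0$
• dual-inner optimal: $\sum y_n\alpha_n = 0$; $\mathbf{w} = \sum \alpha_n y_n \mathbf{z}_n$
• primal-inner optimal (at optimal all ‘Lagrange terms’ disappear): $\alpha_n\bigl(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\bigr) = 0$

These are called the Karush-Kuhn-Tucker (KKT) conditions, necessary for optimality [& sufficient here]

will use KKT to ‘solve’ $(b, \mathbf{w})$ from optimal $\boldsymbol{\alpha}$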
Fun Time

For a single variable $w$, consider minimizing $\frac{1}{2}w^2$ subject to two linear constraints $w \ge 1$ and $w \le 3$. We know that the Lagrange function $\mathcal{L}(w, \boldsymbol{\alpha}) = \frac{1}{2}w^2 + \alpha_1(1 - w) + \alpha_2(w - 3)$. Which of the following equations that contain $\boldsymbol{\alpha}$ are among the KKT conditions of the optimization problem?

(1) $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$
(2) $w = \alpha_1 - \alpha_2$
(3) $\alpha_1(1 - w) = 0$ and $\alpha_2(w - 3) = 0$
(4) all of the above

Reference Answer: (4)

(1) contains dual-feasible constraints;
(2) contains dual-inner-optimal constraints;
(3) contains primal-inner-optimal constraints.
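The four KKT conditions for this small problem can be checked mechanically. The sketch below assumes the Fun Time problem as stated (minimize $\frac{1}{2}w^2$ with $1 \le w \le 3$); the function name `kkt_report` and the tolerance are illustrative choices.

```python
# KKT check for: minimize 0.5 * w**2 subject to w >= 1 and w <= 3, with
# L(w, alpha) = 0.5 * w**2 + alpha1 * (1 - w) + alpha2 * (w - 3).

def kkt_report(w, alpha1, alpha2, tol=1e-9):
    """Return one boolean per KKT condition for the candidate (w, alpha)."""
    return {
        "primal feasible": (w >= 1 - tol) and (w <= 3 + tol),
        "dual feasible": (alpha1 >= -tol) and (alpha2 >= -tol),
        # dL/dw = w - alpha1 + alpha2 = 0
        "dual-inner optimal": abs(w - (alpha1 - alpha2)) <= tol,
        # complementary slackness for both constraints
        "primal-inner optimal": abs(alpha1 * (1 - w)) <= tol
                                and abs(alpha2 * (w - 3)) <= tol,
    }

optimal = kkt_report(w=1.0, alpha1=1.0, alpha2=0.0)
not_optimal = kkt_report(w=2.0, alpha1=2.0, alpha2=0.0)   # feasible, but not optimal

print(optimal)
print(not_optimal)
```

The candidate `w = 2` satisfies feasibility and the gradient condition, yet fails complementary slackness, which is what rules it out.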
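Dual Support Vector Machine: Solving Dual SVM

Dual Formulation of Support Vector Machine

$$\max_{\text{all } \alpha_n \ge 0,\ \sum y_n\alpha_n = 0,\ \mathbf{w} = \sum \alpha_n y_n \mathbf{z}_n} \ -\frac{1}{2}\Bigl\| \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n \Bigr\|^2 + \sum_{n=1}^{N} \alpha_n$$

standard hard-margin SVM dual

$$\min_{\boldsymbol{\alpha}} \ \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} \alpha_n\alpha_m y_n y_m \mathbf{z}_n^T\mathbf{z}_m - \sum_{n=1}^{N} \alpha_n$$

$$\text{subject to} \ \sum_{n=1}^{N} y_n\alpha_n = 0; \quad \alpha_n \ge 0, \ \text{for } n = 1, 2, \ldots, N$$

(convex) QP of $N$ variables & $N + 1$ constraints, as promised

how to solve? yeah, we know QP! :-)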
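Dual SVM with QP Solver

optimal $\boldsymbol{\alpha}$ = ?

$$\min_{\boldsymbol{\alpha}} \ \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} \alpha_n\alpha_m y_n y_m \mathbf{z}_n^T\mathbf{z}_m - \sum_{n=1}^{N} \alpha_n$$

$$\text{subject to} \ \sum_{n=1}^{N} y_n\alpha_n = 0; \quad \alpha_n \ge 0, \ \text{for } n = 1, 2, \ldots, N$$

optimal $\boldsymbol{\alpha} \leftarrow \mathrm{QP}(Q, \mathbf{p}, A, \mathbf{c})$

$$\min_{\boldsymbol{\alpha}} \ \frac{1}{2}\boldsymbol{\alpha}^T Q \boldsymbol{\alpha} + \mathbf{p}^T\boldsymbol{\alpha} \quad \text{subject to} \ \mathbf{a}_i^T\boldsymbol{\alpha} \ge c_i, \ \text{for } i = 1, 2, \ldots$$

• $q_{n,m} = y_n y_m \mathbf{z}_n^T\mathbf{z}_m$
• $\mathbf{p} = -\mathbf{1}_N$
• $\mathbf{a}_{\ge} = \mathbf{y}$, $\mathbf{a}_{\le} = -\mathbf{y}$; $\mathbf{a}_n^T$ = $n$-th unit direction
• $c_{\ge} = 0$, $c_{\le} = 0$; $c_n = 0$

note: many solvers treat equality ($\mathbf{a}_{\ge}$, $\mathbf{a}_{\le}$) & bound ($\mathbf{a}_n$) constraints specially for numerical stability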
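The mapping from data to solver inputs can be sketched in a few lines. This is a minimal illustration, assuming a generic solver that accepts row-wise constraints of the form $\mathbf{a}_i^T\boldsymbol{\alpha} \ge c_i$; the function name `dual_svm_qp_inputs`, the row ordering of `A`, and the toy data are all assumptions of this sketch, and no particular QP library is implied.

```python
# Assemble (Q, p, A, c) for   min 0.5 * a^T Q a + p^T a   s.t.   A a >= c.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dual_svm_qp_inputs(Z, y):
    N = len(y)
    # q_{n,m} = y_n * y_m * z_n^T z_m
    Q = [[y[n] * y[m] * dot(Z[n], Z[m]) for m in range(N)] for n in range(N)]
    p = [-1.0] * N
    # Equality sum_n y_n alpha_n = 0, written as two inequalities.
    A = [[float(yn) for yn in y],       #  y^T alpha >= 0
         [-float(yn) for yn in y]]      # -y^T alpha >= 0
    # Bounds alpha_n >= 0, one unit-direction row per variable.
    for n in range(N):
        A.append([1.0 if m == n else 0.0 for m in range(N)])
    c = [0.0] * (N + 2)
    return Q, p, A, c

Z = [(2.0, 0.0), (-2.0, 0.0)]    # toy data: z_1 = z, z_2 = -z
y = [+1, -1]
Q, p, A, c = dual_svm_qp_inputs(Z, y)
print(Q, p, A, c)
```

Writing the equality as two inequalities yields $N + 2$ rows in `A`, which matches the $\mathbf{a}_{\ge}$, $\mathbf{a}_{\le}$ encoding above; a solver with native equality support would take the single row $\mathbf{y}^T\boldsymbol{\alpha} = 0$ instead.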
Dual SVM with Special QP Solver

$$\min_{\boldsymbol{\alpha}} \ \frac{1}{2}\boldsymbol{\alpha}^T Q_D \boldsymbol{\alpha} + \mathbf{p}^T\boldsymbol{\alpha} \quad \text{subject to special equality and bound constraints}$$

• $q_{n,m} = y_n y_m \mathbf{z}_n^T\mathbf{z}_m$, often non-zero
• if $N = 30{,}000$, dense $Q_D$ ($N$ by $N$ symmetric) takes > 3G RAM
• need special solver for
    • not storing whole $Q_D$
    • utilizing special constraints properly
  to scale up to large $N$

usually better to use special solver in practice
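The memory figure is simple arithmetic, reproduced below under the assumption of 4-byte or 8-byte floating-point entries (the slide does not say which). The helper `q_row` shows one simple way to avoid storing the whole matrix, namely computing a row only when it is needed; it is an illustrative sketch, not a description of how any specific solver works.

```python
def dense_q_bytes(N, bytes_per_entry):
    """Memory needed to hold a dense N-by-N matrix."""
    return N * N * bytes_per_entry

N = 30_000
single = dense_q_bytes(N, 4)    # assuming 4-byte floats
double = dense_q_bytes(N, 8)    # assuming 8-byte floats
print(single / 1e9, "GB", double / 1e9, "GB")

def q_row(n, Z, y):
    """Row n of Q_D computed on demand: q_{n,m} = y_n * y_m * z_n^T z_m."""
    zn = Z[n]
    return [y[n] * y[m] * sum(a * b for a, b in zip(zn, Z[m]))
            for m in range(len(y))]

Z = [(2.0, 0.0), (-2.0, 0.0)]   # toy data, only to exercise q_row
y = [+1, -1]
print(q_row(0, Z, y))
```

Computing rows on demand trades memory for repeated inner products, which is why the cost of each inner product matters.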


Optimal $(b, \mathbf{w})$

KKT conditions

if primal-dual optimal $(b, \mathbf{w}, \boldsymbol{\alpha})$,
• primal feasible: $y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1$
• dual feasible: $\alpha_n \ge 0$
• dual-inner optimal: $\sum y_n\alpha_n = 0$; $\mathbf{w} = \sum \alpha_n y_n \mathbf{z}_n$
• primal-inner optimal (at optimal all ‘Lagrange terms’ disappear): $\alpha_n\bigl(1 - y_n(\mathbf{w}^T\mathbf{z}_n + b)\bigr) = 0$ (complementary slackness)

• optimal $\boldsymbol{\alpha}$ ⇒ optimal $\mathbf{w}$? easy above!
• optimal $\boldsymbol{\alpha}$ ⇒ optimal $b$? a range from primal feasible & equality from comp. slackness if one $\alpha_n > 0$ ⇒ $b = y_n - \mathbf{w}^T\mathbf{z}_n$

comp. slackness: $\alpha_n > 0$ ⇒ on fat boundary (SV!)
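Recovering $(b, \mathbf{w})$ from an optimal $\boldsymbol{\alpha}$ takes only the two KKT equalities. In the sketch below the function name `recover_hyperplane`, the tolerance, and the toy data are assumptions; the value $\alpha_1 = \alpha_2 = 1/(2\|\mathbf{z}\|^2)$ was worked out by hand for this two-point set rather than produced by a solver.

```python
def recover_hyperplane(alpha, Z, y, tol=1e-8):
    """w = sum_n alpha_n y_n z_n;  b = y_s - w^T z_s for any s with alpha_s > 0."""
    d = len(Z[0])
    w = [sum(a * yn * zn[i] for a, yn, zn in zip(alpha, y, Z)) for i in range(d)]
    sv = [n for n, a in enumerate(alpha) if a > tol]   # indices with alpha_n > 0
    s = sv[0]                                          # any one of them will do
    b = y[s] - sum(wi * zi for wi, zi in zip(w, Z[s]))
    return b, w, sv

Z = [(2.0, 0.0), (-2.0, 0.0)]    # z_1 = z, z_2 = -z with z = (2, 0)
y = [+1, -1]
alpha = [0.125, 0.125]           # hand-derived optimum: 1 / (2 * ||z||^2) = 1/8

b, w, sv = recover_hyperplane(alpha, Z, y)
margins = [yn * (sum(wi * zi for wi, zi in zip(w, zn)) + b) for zn, yn in zip(Z, y)]
print(b, w, sv, margins)
```

Both examples have $\alpha_n > 0$, so both sit exactly on the fat boundary, and either one gives the same $b$.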
Fun Time

Consider two transformed examples $(\mathbf{z}_1, +1)$ and $(\mathbf{z}_2, -1)$ with $\mathbf{z}_1 = \mathbf{z}$ and $\mathbf{z}_2 = -\mathbf{z}$. After solving the dual problem of hard-margin SVM, assume that the optimal $\alpha_1$ and $\alpha_2$ are both strictly positive. What is the optimal $b$?

(1) $-1$
(2) $0$
(3) $1$
(4) not certain with the descriptions above

Reference Answer: (2)

With the descriptions, at the optimal $(b, \mathbf{w})$,

$$b = +1 - \mathbf{w}^T\mathbf{z} = -1 + \mathbf{w}^T\mathbf{z}$$

That is, $\mathbf{w}^T\mathbf{z} = 1$ and $b = 0$.
Dual Support Vector Machine: Messages behind Dual SVM

Support Vectors Revisited

• on boundary: ‘locates’ fattest hyperplane; others: not needed
• examples with $\alpha_n > 0$: on boundary
• call $\alpha_n > 0$ examples $(\mathbf{z}_n, y_n)$ support vectors (no longer just candidates)
• SV (positive $\alpha_n$) ⊆ SV candidates (on boundary)

• only SV needed to compute $\mathbf{w}$: $\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{z}_n = \sum_{\text{SV}} \alpha_n y_n \mathbf{z}_n$
• only SV needed to compute $b$: $b = y_n - \mathbf{w}^T\mathbf{z}_n$ with any SV $(\mathbf{z}_n, y_n)$

SVM: learn fattest hyperplane by identifying support vectors with dual optimal solution
Representation of Fattest Hyperplane

SVM: $\mathbf{w}_{\mathrm{SVM}} = \sum_{n=1}^{N} \alpha_n (y_n\mathbf{z}_n)$, with $\alpha_n$ from dual solution

PLA: $\mathbf{w}_{\mathrm{PLA}} = \sum_{n=1}^{N} \beta_n (y_n\mathbf{z}_n)$, with $\beta_n$ by # mistake corrections

$\mathbf{w}$ = linear combination of $y_n\mathbf{z}_n$
• also true for GD/SGD-based LogReg/LinReg when $\mathbf{w}_0 = \mathbf{0}$
• call $\mathbf{w}$ ‘represented’ by data

SVM: represent $\mathbf{w}$ by SVs only
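The PLA half of this comparison is easy to verify by counting corrections. The sketch below is a plain perceptron started from $\mathbf{w} = \mathbf{0}$ with the bias folded into $\mathbf{z}$ as a leading 1; the data, the function name `pla_with_counts`, and the pass limit are assumptions made for the illustration.

```python
def pla_with_counts(Z, y, max_passes=1000):
    """Perceptron from w = 0; beta[n] counts the corrections made on example n."""
    w = [0.0] * len(Z[0])
    beta = [0] * len(y)
    for _ in range(max_passes):
        mistakes = 0
        for n, (zn, yn) in enumerate(zip(Z, y)):
            if yn * sum(wi * zi for wi, zi in zip(w, zn)) <= 0:   # mistake on n
                w = [wi + yn * zi for wi, zi in zip(w, zn)]       # w <- w + y_n z_n
                beta[n] += 1
                mistakes += 1
        if mistakes == 0:        # a full pass without mistakes: done
            break
    return w, beta

# Linearly separable toy data; the first coordinate is the constant 1 for the bias.
Z = [(1.0, 2.0, 1.0), (1.0, 1.0, 3.0), (1.0, -1.0, -2.0), (1.0, -2.0, 0.5)]
y = [+1, +1, -1, -1]

w, beta = pla_with_counts(Z, y)
w_from_beta = [sum(bn * yn * zn[i] for bn, yn, zn in zip(beta, y, Z))
               for i in range(len(Z[0]))]
print(w, beta, w_from_beta)
```

Because every update adds $y_n\mathbf{z}_n$ to a vector that started at zero, the final weights equal $\sum_n \beta_n y_n \mathbf{z}_n$ by construction. The SVM dual differs in that its coefficients come from the QP and are non-zero only for support vectors.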
Summary: Two Forms of Hard-Margin SVM

Primal Hard-Margin SVM

$$\min_{b,\mathbf{w}} \ \frac{1}{2}\mathbf{w}^T\mathbf{w} \quad \text{sub. to} \quad y_n(\mathbf{w}^T\mathbf{z}_n + b) \ge 1, \ \text{for } n = 1, 2, \ldots, N$$

• $\tilde{d} + 1$ variables, $N$ constraints: suitable when $\tilde{d} + 1$ small
• physical meaning: locate specially-scaled $(b, \mathbf{w})$

Dual Hard-Margin SVM

$$\min_{\boldsymbol{\alpha}} \ \frac{1}{2}\boldsymbol{\alpha}^T Q_D \boldsymbol{\alpha} - \mathbf{1}^T\boldsymbol{\alpha} \quad \text{s.t.} \quad \mathbf{y}^T\boldsymbol{\alpha} = 0; \ \alpha_n \ge 0 \ \text{for } n = 1, \ldots, N$$

• $N$ variables, $N + 1$ simple constraints: suitable when $N$ small
• physical meaning: locate SVs $(\mathbf{z}_n, y_n)$ & their $\alpha_n$

both eventually result in optimal $(b, \mathbf{w})$ for fattest hyperplane $g_{\mathrm{SVM}}(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^T\Phi(\mathbf{x}) + b)$





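Are We Done Yet?

goal: SVM without dependence on $\tilde{d}$

$$\min_{\boldsymbol{\alpha}} \ \frac{1}{2}\boldsymbol{\alpha}^T Q_D \boldsymbol{\alpha} - \mathbf{1}^T\boldsymbol{\alpha} \quad \text{subject to} \quad \mathbf{y}^T\boldsymbol{\alpha} = 0; \ \alpha_n \ge 0, \ \text{for } n = 1, 2, \ldots, N$$

• $N$ variables, $N + 1$ constraints: no dependence on $\tilde{d}$?
• $q_{n,m} = y_n y_m \mathbf{z}_n^T\mathbf{z}_m$: inner product in $\mathbb{R}^{\tilde{d}}$, which costs $O(\tilde{d})$ via naive computation!

no dependence only if avoiding naive computation (next lecture :-))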


Fun Time

Consider applying dual hard-margin SVM on $N = 5566$ examples and getting 1126 SVs. Which of the following can be the number of examples that are on the fat boundary, that is, SV candidates?

(1) 0
(2) 1024
(3) 1234
(4) 9999

Reference Answer: (3)

Because SVs are always on the fat boundary,

# SVs ≤ # SV candidates ≤ $N$.
Summary

(1) Embedding Numerous Features: Kernel Models

    Lecture 2: Dual Support Vector Machine
    • Motivation of Dual SVM: want to remove dependence on $\tilde{d}$
    • Lagrange Dual SVM: KKT conditions link primal/dual
    • Solving Dual SVM: another QP, better solved with special solver
    • Messages behind Dual SVM: SVs represent fattest hyperplane

    • next: computing inner product in $\mathbb{R}^{\tilde{d}}$ efficiently

(2) Combining Predictive Features: Aggregation Models
(3) Distilling Implicit Features: Extraction Models





